Web Content Mining for Information on Information Scientists

نویسنده

  • Sarah Risse
چکیده

This paper presents a search system for information on scientists which was implemented prototypically for the area of information science, employing Web Content Mining techniques. The sources that are used in the implemented approach are online publication services and personal homepages of scientists. The system contains wrappers for querying the publication services and information extraction from their result pages, as well as methods for information extraction from homepages, which are based on heuristics concerning structure and composition of the pages. Moreover a specialised search technique for searching for personal homepages of information scientists was developed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...

متن کامل

A web crawler design for data mining

The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage in order to fetch the pages to be analysed. The processing of the text of web pages in order to extract information can be expensive in terms of processor time. Consequently a dist...

متن کامل

Web Content Mining Techniques and Tools

with the spectacularly and unpredictable growth of information available over the Internet, WWW has become a powerful platform for storage & retrieval of information. Due to the heterogeneity & unstructured nature of data on WWW, searching of information is becoming cumbersome & time consuming task. Web mining came as solution for the above problem. Web mining is the means of utilizing data min...

متن کامل

Web Mining: a Comparative Study

Currently, World-Wide Web has developed to a distributed information space with nearly 100 million workstations and several billion pages, which brings the people great trouble in finding needed information although huge amount of information available on webs. The search engine is a very important tool for people to obtain information on Internet, but the low-precision and lowrecall exist wide...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006